Failure Detection vs Group Membership in Fault-Tolerant Distributed Systems: Hidden Trade-Offs
نویسنده
چکیده
Failure detection and group membership are two important components of fault-tolerant distributed systems. Understanding their role is essential when developing efficient solutions, not only in failure-free runs, but also in runs in which processes do crash. While group membership provides consistent information about the status of processes in the system, failure detectors provide inconsistent information. This paper discusses the trade-offs related to the use of these two components, and clarifies their roles using three examples. The first example shows a case where group membership may favourably be replaced by a failure detection mechanism. The second example illustrates a case where group membership is mandatory. Finally, the third example shows a case where neither group membership nor failure detectors are needed (they may be replaced by weak ordering oracles).
منابع مشابه
Node Failure Detection and Membership in CANELy
Fault-tolerant distributed systems based on fieldbuses may benefit to a great extent from the availability of semantically rich communication services, such as those provided by group communication, clock synchronization, membership and failure detection. This is specially true of distributed critical control applications. However, the migration of those services to the realm of simple fieldbus...
متن کاملFailure Detection in Asynchronous Distributed Systems
Being able to detect failures is an important issue in designing fault-tolerant distributed systems. However, the actual behaviour of a system limits the ability to provide such a mechanism. From one extreme of the spectrum, synchronous systems (i.e., with bounded message transmission delay and processing times) allow for the construction of perfect failure detection based simply on local timeo...
متن کاملComparison of Failure Detectors and Group Membership: Performance Study of Two Atomic Broadcast Algorithms
Protocols that solve agreement problems are essential building blocks for fault tolerant distributed systems. While many protocols have been published, little has been done to analyze their performance, especially the performance of their fault tolerance mechanisms. In this paper, we present a performance evaluation methodology that can be generalized to analyze many kinds of fault-tolerant alg...
متن کاملA Leader Election Protocol for Fault Recovery in Asynchronous Fully-Connected Networks
We introduce a new algorithm for consistent failure detection in asynchronous systems. Informally, consistent failure detection requires processes in a distributed system to distinguish between two diierent populations: a fault free population and a faulty one. The major contribution of this paper is in combining ideas from group membership and leader election, in order to have an election prot...
متن کاملArchitecture and Protocols for Fault-Tolerant Distributed Objects
We present an architecture and protocols for distributed fault-tolerant objects. The architecture and protocols are based on a novel object-centric formulation of the reliable group multicast problem which we refer to as the Group State Synchronization Problem (GSSP). This formulation allows us to capture and represent fault-tolerance semantics at the object level. The GSSP formulation has thre...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2002